Volume 93, Number 3, May-June 1988 

Journal of Research of the National Bureau of Standards 



Accuracy in Trace Analysis 



[Figure panels omitted: test yield and defect rate (%) plotted
against frequency of errors (f, %); one legend entry reads
"Multi-rule, N = 4".]
Figure 1. Comparison of quality (defect rate) and productivity
(test yield) for a batch analytical process using a multi-rule
control procedure with N = 2 and N = 4. From reference [1].

Figure 2. Comparison of quality (defect rate) and productivity
(test yield) for a batch analytical process with different control
rules all using N = 2 (2 control measurements per run). From
reference [1].

Figure 3. Comparison of quality (defect rate) and productivity
(test yield) for a batch analytical process with different control
rules and different N's. From reference [1].



1. Introduction 

It is well known that there are important factors 
affecting accuracy in trace analysis, such as han- 
dling loss, contamination and purity of reagents. 
The contention of this paper is that the statistical 
model for analytical error is another important fac- 
tor, that currently is receiving much attention. A 
normal distribution is the model upon which the 
statistical procedures used in laboratory quality 
control (QC) customarily are based. However, ex- 
aminations of certain analytical procedures and re- 
sults of trace analyses reveal features that are 
inconsistent with normality. 



2. Alternate Models 

The three models to be discussed are the normal 
distribution, the lognormal distribution, and no dis- 
tribution. There are good reasons for expecting one 
of these models to be appropriate in many circum- 
stances. 

The normal distribution is symmetric, bell-shaped,
and of unbounded range. It is characterized by two
independent parameters, its mean ($\mu$) and
standard deviation ($\sigma$), which can be directly
associated with analytical bias and precision. One 
reason for the widespread applicability of this 
model is the Central Limit Theorem, the practical 
result of which is that sums of random variables 
tend to be normally distributed under mild condi- 
tions. Based on the Central Limit Theorem, it has 
been said that "in the case of a well devised analyt- 
ical system of measurement and a properly per- 
formed analysis . . . analytical results will be 
normally distributed or, at least, almost so" [1]. 
This statement assumes that analytical errors are 
additive. 

The lognormal distribution [2] is asymmetric,
bounded below by zero, and is defined by the function

$$f(x) = \frac{\exp\left[-(\log(x) - \alpha)^2 / 2\delta^2\right]}{(2\pi\delta^2 x^2)^{1/2}}$$

The distribution takes its name from the fact that
if $x$ is lognormally distributed, $\log(x)$ is normally
distributed. The mean and standard deviation of
the distribution are $\mu = \exp(\alpha + \tfrac{1}{2}\delta^2)$ and
$\sigma = \mu(\exp(\delta^2) - 1)^{1/2}$; thus the coefficient of
variation (CV) is constant with respect to the mean. For
QC use one can reparameterize the model to make
the mean and coefficient of variation the basic
parameters (by defining $\delta^2 = \log[1 + (CV/100)^2]$ and
$\alpha = \log(\mu) - \delta^2/2$). In addition, the model can be
generalized to shift its origin from zero to $\gamma$ by
taking $\log(x + \gamma)$ to be normally distributed.
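
The reparameterization just described can be sketched in a few lines. This is a minimal illustration, not code from the paper: the function names are hypothetical, `alpha` and `delta` stand for the mean and standard deviation of log(x), and natural logarithms are assumed throughout.

```python
import math


def lognormal_params(mean, cv_percent):
    """Convert (mean, CV in percent) to the log-scale parameters (alpha, delta)."""
    delta_sq = math.log(1.0 + (cv_percent / 100.0) ** 2)
    alpha = math.log(mean) - delta_sq / 2.0
    return alpha, math.sqrt(delta_sq)


def lognormal_mean_sd(alpha, delta):
    """Recover the mean and standard deviation from (alpha, delta)."""
    mean = math.exp(alpha + 0.5 * delta ** 2)
    sd = mean * math.sqrt(math.exp(delta ** 2) - 1.0)
    return mean, sd


def lognormal_pdf(x, alpha, delta):
    """Lognormal density; zero for x <= 0 because the model is bounded below by zero."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - alpha) / delta
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * delta * x)


# Hypothetical example: a mean of 100 with a CV of 50%.
alpha, delta = lognormal_params(100.0, 50.0)
print(alpha, delta, lognormal_mean_sd(alpha, delta))
```

Converting to (alpha, delta) and back should return the original mean and a standard deviation equal to mean times CV/100, which is a convenient check on the formulas.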



Figure 1 illustrates how the skewness of the
distribution increases as the CV increases. For CVs of
less than 10%, the model differs very little in shape 
from the normal distribution. (This is another rea- 
son for the usefulness of the normal model.) 

Based on the Central Limit Theorem and the 
fact that the logarithm of a product of random 
variables equals the sum of the logarithms of the 
random variables, the lognormal model tends to be 
appropriate for multiplicative processes. Figure 2 
shows how rapidly the distribution of a product of 
random variables can approach the lognormal dis- 
tribution as the number of factors (n) increases. 
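
The convergence described above can be explored with a small simulation. The sketch below is not the calculation behind Figure 2; it assumes, purely for illustration, that each factor is uniformly distributed on (0.5, 1.5), and it uses the skewness of log(product) as a rough indicator of how close the product is to lognormal (a value near zero is consistent with lognormality). The function names, sample size, and seed are all hypothetical choices.

```python
import math
import random


def skewness(values):
    """Moment-based sample skewness."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_product_skewness(n_factors, n_samples=20000, seed=1):
    """Skewness of log(product of n_factors independent uniform factors)."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n_samples):
        product = 1.0
        for _ in range(n_factors):
            product *= rng.uniform(0.5, 1.5)  # assumed factor distribution
        logs.append(math.log(product))
    return skewness(logs)


for n in (1, 4, 16):
    print(n, round(log_product_skewness(n), 3))
```

Under these assumptions the skewness of the log-transformed product should shrink toward zero as the number of factors grows, in line with the Central Limit Theorem argument.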

The third alternative, no distribution, is the most 
likely model in the absence of effective quality con- 
trol. The first job before applying statistical models 
and procedures is to get a distribution. "Stability, 
or the existence of a system, is seldom a natural 
state. It is an achievement ..." [3]. Producing a 
consistent distribution requires serious effort from 
design (method development) through production 
(routine analysis). 



3. Choosing the Right Model 

One way to decide what statistical model to use 
is to turn to the technical literature for guidance. 
However, many sources do not address the subject 
explicitly (e.g., see the ACS Principles of Environ- 
mental Analysis [4]), and others give conflicting 
advice. For example, the Statistical Manual of the 
AOAC [5] says: "It is understood that random er- 
rors are equally likely to be positive or negative 
and to vary in size in a manner that is adequately 
described by the normal law of errors." 
Eckschlager and Stepanek [6] say that a shifted 
lognormal distribution is appropriate for concen- 
trations above the determination limit. Thompson 
and Howarth [7] claim to make the "case against" 
the lognormal distribution. 

A second approach to choosing a model is to 
look at data. Unfortunately, there is seldom enough 
data to reach a conclusion [7], or the data is messy 
(contains blunders and outliers and is censored be- 
low the limit of quantitation). Nevertheless, one 
can find clues in QC data; for example, appropri- 
ateness of CV and percent recovery as summary 
statistics hint at a multiplicative process and log- 
normality. The work of Horwitz and colleagues in 
relating analytical precision to concentration 
across many analytical methods is interesting in 
this regard: it shows the widespread usefulness of 
the CV as a measure of imprecision, and shows
how the total CV increases as the level of
applicability of methods decreases [8]. At the ppb level,
the Horwitz model gives a CV of about 45%.
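
The 45% figure can be checked against the commonly quoted form of the Horwitz relationship, CV(%) = 2^(1 - 0.5 log10 C), where C is the concentration expressed as a dimensionless mass fraction. The formula is not written out in the text above, so the sketch below assumes that this is the form intended; the function name is hypothetical.

```python
import math


def horwitz_cv_percent(mass_fraction):
    """Predicted between-laboratory CV (%) at a concentration given as a mass fraction."""
    return 2.0 ** (1.0 - 0.5 * math.log10(mass_fraction))


# 1 ppm = 1e-6 and 1 ppb = 1e-9 as mass fractions.
for label, c in (("1 ppm", 1e-6), ("1 ppb", 1e-9)):
    print(label, round(horwitz_cv_percent(c), 1))
```

At a mass fraction of 1e-9 the expression evaluates to 2^5.5, which is close to the 45% quoted for the ppb level.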

The third and best method of choosing a model 
is to combine whatever knowledge can be obtained 
from data with what can be deduced from the
nature of the measurement process. Is the process
additive or multiplicative? There are certain common
steps in sample preparation, such as concentration,
dilution and extraction, that are multiplicative in
nature [9,10]. An example of an analytical method
with a multiplicative quantitation process is the
GC/MS method for wastewater analysis [11]: the 
test result is a product or quotient of response fac- 
tor, concentration of internal standard, peak areas, 
and volume of original sample. Coupled with the 
fact that CVs for this method are as high as 50%, 
there is good reason to expect the lognormal model 
to be appropriate. 
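
For a multiplicative process of this kind, the CVs of the individual factors combine in a simple way: for a product of independent factors, 1 + CV^2 of the result equals the product of the individual (1 + CV_i^2) terms. The sketch below applies that identity. The function name and the four 20% component CVs are hypothetical and are not taken from the method in [11]; factors that enter as divisors are covered by the same identity only if they are themselves lognormal.

```python
import math


def combined_cv_percent(cv_percents):
    """CV (%) of a product of independent factors with the given CVs (%)."""
    factor = 1.0
    for cv in cv_percents:
        factor *= 1.0 + (cv / 100.0) ** 2
    return 100.0 * math.sqrt(factor - 1.0)


# Hypothetical example: four independent factors, each with a CV of 20%.
print(round(combined_cv_percent([20.0, 20.0, 20.0, 20.0]), 1))
```

Even moderate component CVs compound quickly, which suggests how a multi-step multiplicative method can reach total CVs of the size mentioned above.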



4. Impact of the Lognormal Model 



A legitimate concern with the lognormal model
is how it will affect traditional concepts and
procedures of analytical QC. The answer is that the
impact ranges from negligible to serious depending
on the procedure involved. Consider these
examples:

(1) Repeatability interval. The distribution of
the difference of two identically distributed
lognormal random variables is well approximated by
the normal distribution. Therefore, the impact of
lognormality on this concept is negligible.

(2) Control charts. It is not necessary to work
with log-transformed data to control either bias or
precision. For example, bias can be monitored with
a percent recovery control chart [12], and
within-laboratory precision by a chart for the ratio of
duplicate measurements. It becomes more important
to base control limits on the lognormal rather than
the normal model the farther the CV gets above
10%.

(3) Youden two-sample chart. Figure 3 shows
the type of patterns to expect from lognormal data
for two different degrees of between-laboratory
variation. One should expect fan-shaped patterns
with points concentrated in the first quadrant,
rather than the more balanced elliptical patterns
characteristic of the normal distribution [5].

(4) Outlier tests. The commonly used outlier
tests are based on the normal distribution, which is
symmetric, so applying such tests to lognormal
data tends to give erroneous results. For example,
when Grubbs' test is applied to untransformed
lognormal data, there is a tendency to miss real
lower-tail outliers and to find too many upper-tail
"outliers." The problem grows as the CV increases,
but it is easily cured by applying the test to
log-transformed data.

In conclusion, the lognormal distribution appears
to be the appropriate model for some methods of
trace analysis. As the CV increases, it becomes
more important to use this model when it is
appropriate; its use does not require unmanageable
changes in analytical QC practices.

Figure 1. Lognormal distributions.

Figure 3. Expected configurations of Youden two-sample
charts for lognormal data.
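
The remark under control charts, that limits should rest on the lognormal model once the CV is well above 10%, can be illustrated numerically. The sketch below compares symmetric limits at the mean plus or minus k standard deviations with limits set at plus or minus k standard deviations on the log scale. The function names, the target mean of 100 (for example, a percent recovery), and the choice k = 3 are assumptions made for illustration, not values from the paper.

```python
import math


def normal_limits(mean, cv_percent, k=3.0):
    """Symmetric limits mean +/- k*sd under the normal model."""
    sd = mean * cv_percent / 100.0
    return mean - k * sd, mean + k * sd


def lognormal_limits(mean, cv_percent, k=3.0):
    """Limits exp(alpha +/- k*delta), i.e. +/- k standard deviations of log(x)."""
    delta_sq = math.log(1.0 + (cv_percent / 100.0) ** 2)
    delta = math.sqrt(delta_sq)
    alpha = math.log(mean) - delta_sq / 2.0
    return math.exp(alpha - k * delta), math.exp(alpha + k * delta)


# Hypothetical comparison at a small and a large CV.
for cv in (5.0, 50.0):
    print(cv, normal_limits(100.0, cv), lognormal_limits(100.0, cv))
```

With a CV of 5% the two sets of limits are close, while at 50% the normal-model lower limit falls below zero, which is impossible for a concentration; the lognormal limits stay positive and are asymmetric about the mean.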



The Reactor Laboratory carries out analytical 
service by using neutron activation analysis. Alto- 
gether 50 elements are analyzed within a wide 